# Load libraries
import os
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
import rioxarray as rioxr
# Read in data
= os.path.join('data', 'California_Fire_Perimeters_(all).shp')
fp_perimeter = gpd.read_file(fp_perimeter) perimeter
Background:
The Thomas Fire, which burned over 280,000 acres in Ventura and Santa Barbara counties in December 2017, was one of California’s largest wildfires at the time. It caused widespread ecological damage, displaced communities, and left lasting environmental impacts.
In this analysis, I will find the perimeter of the fire, analyze spatial data of the lasting fire scars through false color images, and visualize the effects that this fire had on air quality in the Santa Barbara area.
About the data
In this task I will use historical open-access data about fire perimeters in California. There are several datasets with this information online. The dataset that I found is from data.gov at this link: https://catalog.data.gov/dataset/california-fire-perimeters-all-b3436. It was a particularly useful site, as there were multiple filetypes to choose from.
The next dataset I will use is a simplified collection of bands (red, green, blue, near-infrared and shortwave infrared) from the Landsat Collection 2 Level-2 atmosperically corrected surface reflectance data, collected by the Landsat 8 satellite. These data were retrieved from the Microsof Planetary Computer data catalogue and pre-processed to remove data outside land and coarsen the spatial resolution. (This data should be used for visualization and educational purposes only.)
Finally, I will use Air Quality Index (AQI) data from the US Environmental Protection Agency to visualize the impact on the AQI of the 2017 Thomas Fire in Santa Barbara County.
First up in my analysis:
Fire perimeter data retrieval and selection
I will find and isolate the perimeter of the Thomas Fire, using open source data. I will then be able to use the Thomas Fire perimeter data in further analysis of the effects of the fire on Santa Barbara ecology. I will save this perimeter data as a file in my repository, for independent use in other notebooks.
To begin, I will do some exploratory data analysis to get a sense of the dataset I am using. I will ensure that I know the CRS of the data, for use in further spatial data joining and analysis.
3) perimeter.head(
YEAR_ | STATE | AGENCY | UNIT_ID | FIRE_NAME | INC_NUM | ALARM_DATE | CONT_DATE | CAUSE | C_METHOD | OBJECTIVE | GIS_ACRES | COMMENTS | COMPLEX_NA | IRWINID | FIRE_NUM | COMPLEX_ID | DECADES | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2023 | CA | CDF | SKU | WHITWORTH | 00004808 | 2023-06-17 | 2023-06-17 | 5 | 1 | 1 | 5.72913 | None | None | {7985848C-0AC2-4BA4-8F0E-29F778652E61} | None | None | 2020 | POLYGON ((-13682443.000 5091132.739, -13682445... |
1 | 2023 | CA | LRA | BTU | KAISER | 00010225 | 2023-06-02 | 2023-06-02 | 5 | 1 | 1 | 13.60240 | None | None | {43EBCC88-B3AC-48EB-8EF5-417FE0939CCF} | None | None | 2020 | POLYGON ((-13576727.142 4841226.161, -13576726... |
2 | 2023 | CA | CDF | AEU | JACKSON | 00017640 | 2023-07-01 | 2023-07-02 | 2 | 1 | 1 | 27.81450 | None | None | {B64E1355-BF1D-441A-95D0-BC1FBB93483B} | None | None | 2020 | POLYGON ((-13459243.000 4621236.000, -13458968... |
# Figure out the dimensions of the dataframe
print("Shape of the data:", perimeter.shape)
# Figure out if the columns are the expected datatypes
print("Data types:", perimeter.dtypes)
Shape of the data: (22261, 19)
Data types: YEAR_ int64
STATE object
AGENCY object
UNIT_ID object
FIRE_NAME object
INC_NUM object
ALARM_DATE object
CONT_DATE object
CAUSE int64
C_METHOD int64
OBJECTIVE int64
GIS_ACRES float64
COMMENTS object
COMPLEX_NA object
IRWINID object
FIRE_NUM object
COMPLEX_ID object
DECADES int64
geometry geometry
dtype: object
# Explore data CRS
perimeter.crs
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World between 85.06°S and 85.06°N.
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
# Find out if the CRS is projected
perimeter.crs.is_projected
True
From this data exploration, I learned that the dataset is much larger than I need, but does contain useful information in addition to the geometries, such as acres burned, cause, etc. I learned that the names of the fires are in all capitals, and that the year numbers are int64, so I can treat them as numeric values. Finally, I leaned that the CRS is WGS 84, and that is is projected data, rather than geographic.
From this fire perimeter data, I will select the Thomas Fire boundary. The fire occurred in 2017.
# Make the column names lower case
= perimeter.columns.str.lower()
perimeter.columns
# Filter data to only include the Thomas Fire boudnary in 2017
= perimeter[(perimeter['fire_name'] == "THOMAS") & (perimeter['year_'] == 2017)]
thomas
thomas
year_ | state | agency | unit_id | fire_name | inc_num | alarm_date | cont_date | cause | c_method | objective | gis_acres | comments | complex_na | irwinid | fire_num | complex_id | decades | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2654 | 2017 | CA | USF | VNC | THOMAS | 00003583 | 2017-12-04 | 2018-01-12 | 9 | 7 | 1 | 281791.0 | CONT_DATE based on Inciweb | None | None | None | None | 2010 | MULTIPOLYGON (((-13316089.016 4088553.040, -13... |
Now I will save only the 2017 Thomas Fire boundary as a GeoJSON file. The file should go into the data/ directory in my repository.
# Save the fire boundary as a file that can go into my repository
# Save the filtered GeoDataFrame as a GeoJSON file
= 'data/thomas.geojson'
path ='GeoJSON') thomas.to_file(path, driver
I chose to use a GeoJSON file format for my perimeter boundary because it is a common and useful “open format for encoding vector points and their attributes”. It comes in one file, as compared to .shp files, which have many dependencies. It requires the data be in WGS84, and since I already verified that this data is in that CRS, it seems like the best possible option for this analysis.
Visualizing fire scars through false color
False color imagery, created using satellite data from instruments like Landsat, is a useful tool for monitoring wildfire impacts. By assigning infrared bands to visible colors, these images highlight vegetation health, burn severity, and the extent of fire scars. This approach helps researchers and land managers assess recovery efforts, identify high-risk areas, and plan restoration strategies.
I will create a false color image of the Thomas Fire using remote sensing data, highlighting the fire scar and exploring how coding and data visualization support environmental monitoring.
Next up:
True Color Image
As I import the raster file to make this true color image, I add the parameter decode_coords="all"
to the code. This parameter controls how the coordinate metadata in the NetCDF file are processed. Specifically, the all
distinction decodes all coordinates in the file to easily useable xarray coordinate variables.
# Construct a file path to the Landsat data using os and import it
= os.path.join('data', 'landsat8-2018-01-26-sb-simplified.nc')
fp = rioxr.open_rasterio(fp, decode_coords="all")
sb_rast sb_rast
<xarray.Dataset> Size: 25MB Dimensions: (band: 1, x: 870, y: 731) Coordinates: * band (band) int64 8B 1 * x (x) float64 7kB 1.213e+05 1.216e+05 ... 3.557e+05 3.559e+05 * y (y) float64 6kB 3.952e+06 3.952e+06 ... 3.756e+06 3.755e+06 spatial_ref int64 8B 0 Data variables: red (band, y, x) float64 5MB ... green (band, y, x) float64 5MB ... blue (band, y, x) float64 5MB ... nir08 (band, y, x) float64 5MB ... swir22 (band, y, x) float64 5MB ...
# Explore the data
print('Shape: ', dict(sb_rast.sizes))
print(sb_rast.data_vars, '\n')
Shape: {'band': 1, 'x': 870, 'y': 731}
Data variables:
red (band, y, x) float64 5MB ...
green (band, y, x) float64 5MB ...
blue (band, y, x) float64 5MB ...
nir08 (band, y, x) float64 5MB ...
swir22 (band, y, x) float64 5MB ...
This data exploration shows me that there is only one band on this raster. This means that including it is redundant, and the band will need to be removed. I also learned that each color band is a float64 integer, which is good to know when doing analysis.
# Drop the band dimension of the data
# Original dimensions and coordinates
print(sb_rast.dims, sb_rast.coords, '\n')
# Remove length 1 dimension (band)
= sb_rast.squeeze()
sb_rast print(sb_rast.dims, sb_rast.coords, '\n')
# Drop the coordinates associated to band
= sb_rast.drop_vars('band')
sb_rast print(sb_rast.dims, sb_rast.coords, '\n')
FrozenMappingWarningOnValuesAccess({'band': 1, 'x': 870, 'y': 731}) Coordinates:
* band (band) int64 8B 1
* x (x) float64 7kB 1.213e+05 1.216e+05 ... 3.557e+05 3.559e+05
* y (y) float64 6kB 3.952e+06 3.952e+06 ... 3.756e+06 3.755e+06
spatial_ref int64 8B 0
FrozenMappingWarningOnValuesAccess({'x': 870, 'y': 731}) Coordinates:
band int64 8B 1
* x (x) float64 7kB 1.213e+05 1.216e+05 ... 3.557e+05 3.559e+05
* y (y) float64 6kB 3.952e+06 3.952e+06 ... 3.756e+06 3.755e+06
spatial_ref int64 8B 0
FrozenMappingWarningOnValuesAccess({'x': 870, 'y': 731}) Coordinates:
* x (x) float64 7kB 1.213e+05 1.216e+05 ... 3.557e+05 3.559e+05
* y (y) float64 6kB 3.952e+06 3.952e+06 ... 3.756e+06 3.755e+06
spatial_ref int64 8B 0
To get a good look at this raster data, without creating any new variables:
I will select the red, green, and blue variables (in that order) of the xarray.Dataset holding the Landsat data, convert it to an xarray.DataArray using the to_array() method, and then use .plot.imshow() to create an RGB image with the data. There will be a warning, that’s ok. I will adjust the scale used for plotting the bands to get a true color image.
The first plot will have the parameter set to be robust=False.
# Select red, green, and blue variables, stack them, and plot as an RGB image
'red', 'green', 'blue']].to_array().plot.imshow(robust=False) sb_rast[[
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
As we can see, this doesn’t turn out quite as we’d hoped. That’s because, by setting that parameter to False, we are not accounting for cloud cover. Their RGB values are outliers and cause the other values to be squished when plotting.
To account for this, I will use the robust = True parameter at the end of my code, in order to deal with the clouds:
# Select red, green, and blue variables, stack them, and plot as an RGB image
'red', 'green', 'blue']].to_array().plot.imshow(robust=True) sb_rast[[
This true color image gives us a visual that mostly resembles what we would expect to see with our human eyes looking down on Santa Barbara. The colors are what we would expect to see, and this can be useful for identifying landmarks. However, sometimes it is important to get a new perspective. For example, using this map, it is almost impossible to see the area in which the Thomas Fire burned. And that’s when we bring in…
4. False color image
To continue my analysis, and without creating any new variables, I will create a false color image by plotting the short-wave infrared (swir22), near-infrared, and red variables (in that order).
# Select the swir22, near-infrared, and red variables, stack them, and plot as a false color image
'swir22', 'nir08', 'red']].to_array().plot.imshow(robust=True) sb_rast[[
False color imagery, created using satellite data from instruments like Landsat, is a useful tool for monitoring wildfire impacts. By assigning infrared bands to visible colors, these images highlight vegetation health, burn severity, and the extent of fire scars. This approach helps researchers and land managers assess recovery efforts, identify high-risk areas, and plan restoration strategies.
5. Map
Finally, I will create a map showing the shortwave infrared/near-infrared/red false color image together with the Thomas Fire perimeter.
# Read in the Thomas fire perimeter we created
= os.path.join('data', 'thomas.geojson')
fp2 = gpd.read_file(fp2)
thomas_perim thomas_perim.plot()
This initial plot shows us that our perimeter file is looking good. It also shows us the perimeter that we will want to clip our raster file to, in order to create our map analyzing the burn area of the Thomas Fire.
First, we will have to ensure that our CRS’s match for these datasets…
# Examine CRss
print('Santa Barbara Raster CRS: ', sb_rast.rio.crs)
print('Thomas Fire Perimeter CRS: ', thomas_perim.crs)
Santa Barbara Raster CRS: EPSG:32611
Thomas Fire Perimeter CRS: EPSG:3857
# Reproject CRS of the Santa Barbara Raster
= sb_rast.rio.reproject("EPSG:3857")
sb_rast print('Matched CRS?', sb_rast.rio.crs == thomas_perim.crs)
Matched CRS? True
# Clip the sb_rast map to match the dimensions of the Thomas Fire Perimeter
= sb_rast.rio.clip_box(*thomas_perim.total_bounds) sb_fire
# Map our false color image with the fire boundary overlaid
= plt.subplots(figsize=(10, 10))
fig, ax 'swir22', 'nir08', 'red']].to_array().plot.imshow(ax=ax, robust=True)
sb_fire[[=ax, edgecolor="firebrick", linewidth = 2, label="Thomas Fire Perimeter")
thomas_perim.boundary.plot(ax"Thomas Fire: Burn Scars in False Color Imagery", fontsize=16)
ax.set_title(='upper right', fontsize=12)
ax.legend(loc
plt.show()
Our true color image was insufficient to reveal plainly the path of the Thomas Fire. However, by simply utilizing a false color composit, the path of the fire and the effect it had on the landscape are much more clear.
This final map clearly shows the burn scars from the fire, contained within the perimeter boundary we utilized to help us further identify the exact area effected.
Understanding air quality in the aftermath of a fire
For the final step of my analysis, I will make a plot to aid in visualization for the air quality in Santa Barbara as a result of this fire.
Complete Workflow
To run this analysis with maximum effiency, I will read in the data, clean it, and plot it all in one code chunk. The result will be a clear and easy to read plot that shows us air quality through time in the region of choice.
# Read and combine AQI data
= pd.concat([
aqi "https://aqs.epa.gov/aqsweb/airdata/daily_aqi_by_county_2017.zip", compression='zip'),
pd.read_csv("https://aqs.epa.gov/aqsweb/airdata/daily_aqi_by_county_2018.zip", compression='zip')
pd.read_csv(
])
# Clean data and filter for Santa Barbara, and index by date
= (aqi
aqi_sb =lambda x: x.strip().lower().replace(' ', '_')) # Simplify column names
.rename(columns"county_name == 'Santa Barbara'") # Filter for Santa Barbara
.query(=['state_name', 'county_name', 'state_code', 'county_code']) # Drop unneeded columns
.drop(columns=lambda x: pd.to_datetime(x['date'])) # Convert date column
.assign(date'date') # Set date as index
.set_index(
)
# Calculate and plot the 5-day rolling average AQI next to the daily value
"five_day_average"] = aqi_sb['aqi'].rolling('5D').mean()
aqi_sb[= aqi_sb[['aqi', 'five_day_average']].plot(
catch_aqi_plot ='line',
kind="Daily and Average AQI in Santa Barbara County",
title=['coral', 'seagreen']
color
)'Air Quality Index')
catch_aqi_plot.set_ylabel('Date')
catch_aqi_plot.set_xlabel(
# Update the legend labels
'Daily AQI', '5-Day Rolling Average'], loc='best') catch_aqi_plot.legend([
Calculating and adding the 5 day rolling average into our plot adds another dimension to the data. Though it lessens more extreme peaks, it shows us that even when averaging out, we can see clear spikes in the Air Quality Index. This helps account for outliers in the data, while still revealing trends.
Conclusion
Using Python in Jupyter Notebooks, I have successfully executed a series of analysis analyzing a fire that had clear and lasting effects on the regions of Santa Barbara and Ventura. This analysis also showcases the variety of data types that Python can handle, from spatial data to data visualization.
Citations:
AirNow. “Air Quality Index (AQI) Basics.” Accessed December 4, 2024. https://www.airnow.gov/aqi/aqi-basics/.
C. Galaz García, EDS 220 - Working with Environmental Datasets, Course Notes. 2024. [Online]. Available: https://meds-eds-220.github.io/MEDS-eds-220-course/book/preface.html
CAL Fire. “California Fire Perimeters (All).” Data.gov. Metadata created March 30, 2024, updated May 14, 2024. https://catalog.data.gov/dataset/california-fire-perimeters-all-b3436.
Hamm, Keith. “Closing Schools and Moving Finals Due to Thomas Fire: A Look at Our Education System’s Response to the Wildfire.” Santa Barbara Independent, December 13, 2017. https://www.independent.com/2017/12/13/closing-schools-and-moving-finals-due-thomas-fire/.
Microsoft Planetary Computer. Landsat Collection 2 Level-2 Atmospherically Corrected Surface Reflectance Data from Landsat 8 [Dataset]. Simplified for visualization and educational purposes. Accessed November 20, 2024. https://planetarycomputer.microsoft.com.
U.S. Environmental Protection Agency. “Air Data: Air Quality Data Collected at Outdoor Monitors Across the US.” Accessed December 4, 2024. https://www.epa.gov/outdoor-air-quality-data.
Citation
@online{jørgensen2024,
author = {Jørgensen, Bailey},
title = {Thomas {Fire} {Analysis}},
date = {2024-10-18},
url = {https://jorb1.github.io/posts/2024-12-4-thomas-fire-post/},
langid = {en}
}